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Abstract: In the note we consider an iterative generalisation of the rejection samphng 
^ ' method. In high energy physics, this samphng is frequently used for event generation, i.e. 

^ ■ preparation of phase space points distributed according to a matrix element squared |Mp 

QQ . for a scattering process. In many realistic cases |Mp is a complicated multi-dimensional 

^N \ function, so, the standard von Neumann procedure has quite low efficiency, even if an 

r^ \ error reducing technique, like VEGAS, is applied. As a result of that, many of the |Mp 

QQ \ calculations go to "waste". The considered iterative modification of the procedure can 

^^ ' extract more "unweighted" events, i.e. distributed according to |Mp. In several simple 

examples we show practical benefits of the technique and obtain more events than the 
^ ■ standard von Neumann method, without any extra calculations of |Mp. 
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1. Introduction 

Simulated events of particle scattering are one of the main tools both in modern theoretical 
research and in current experimental analyses in the high energy physics (HEP). Usually 
the events are prepared by means of Monte Carlo techniques. The reason for the success 
of the methods is very clear: current simulation problems require realistic events of the 
processes 2 — > A^, where N is set by the accelerator energy and is equal to several hundreds 
for the LHC, for instance. But even applied approximations such as partons showers and 
zero width decays for the final particles leave us with A^ ~ 3 — 10. This large number 
corresponds to a 5 — 25 dimension phase space. Owing to the large number of dimensions 
and complexity of the investigated functions - the matrix elements squared - we cannot 
rely on deterministic grid methods due to the well-known "curse of dimensionality" . 

The standard scheme applied is the following: at first we prepare a matrix element 
squared symbolically (by hand or automatically), construct an importance sampling func- 
tion, again manually or automatically by applying, for example, VEGAS [|l|; obtain an 
estimation of the total integral (the total cross section), for chosen cuts and parameters, 
by the classic Monte-Carlo method, find the function maximum (or an array of maxima, 
since stratified sampling may be worth applying), generate enough weighted events for the 
function, and perform the "hit-and-miss" von Neumann procedure [^. The final step maps 
the weighted points, distributed according to the importance sampling function, to a subset 
of the sample - realistic independent unweighted events distributed according to the matrix 
element squared. This is the general and oversimplified scheme of functioning of almost 
all modern Monte-Carlo generators, see for example the review Q. Useful information on 
current mathematical Monte-Carlo techniques applied in HEP can also be found in the 
short review by Weinzierl 0]. 



Although the scheme described guarantees that the unweighted events are distributed 
according to the function considered, the selection efficiency usually is not too high, espe- 
cially if the function has lots of sharp peaks. The main reason is that importance sampling 
functions are simple functions and they have much smoother behaviour and describe the 
real function behaviour in a very crude approximation. As a result we get lots of calculated 
points going into the "waste bin". Since the cost calculating of the investigate function 
can be quite large it would be worth considering whether we can extract more information 
(e.g. more unweighted events) from the "waste". The most obvious way to achieve this 
goal is to repeat the von Neumann procedure with the rejected phase space points. It is 
rather clear that these points can be used for the task, but with worse quality. Since, at 
the first rejection stage, some points have been accepted and, so, removed from the sample 
of the weighted events, we should change the weights of all remaining points. The paper 
is devoted to the construction of a procedure, which gives us the possibility to use the 
rejected points iteratively. 

In fact, the idea of using rejected points is not quite new. For example, an iterative 
von Neumann procedure for the extraction of random bits from an independent but biased 
bit array was constructed in [^. It originates from one of von Neumann's ideas published 
in his famous paper O. In this paper we generalise another procedure set out in the paper 
- rejected sampling. 

As we mentioned earlier the initial sample should be prepared with an importance 
sampling function, in order to increase selection efficiency. Since we are interested in 
an automated procedure, we will use the VEGAS algorithm, the de facto standard of 
importance sampling in HEP. It is worth noticing that VEGAS itself can be improved, as 
is described in Ohl's paper [^. For some practical and historical reasons we will apply the 
slightly adapted VEGAS algorithm from CompHEP 0. 

In Sec. |2|we give some mathematical arguments in favour of the iterative von Neumann 
procedure and formulate the method itself. In Sec. ^ we define a stopping rule for the 
method. Three model, but rather realistic, examples are considered in Sec. Q. Final 
remarks and conclusions are given in the section 0. 

2. Iterative von Neumann procedure 

We start from the standard "hit-and-miss" von Neumann procedure applied to a non- 
negative function /(x), where x = {xi, ...^Xd} is a point in the d-dimension phase space. 
For simplicity we limit ourselves to a bounded domain T) in the phase space. At first, we 
prepare a sample of A'" phase space points {xjJat, i = 0, ..., A^ — 1 in P, distributed according 
to an importance sampling distribution g{x)^ where g{x) is a positive function normalised 
with the condition J^g{x)dx = 1. So, the corresponding weight Wj of each point is equal 
to f{xi)/g{xi). Having the weights we can estimate the total integral 



1 ^-1 
/ f{x)dx ^ ifh = {f/9)s = ^ E 



where we denote an estimation of the integral with a sample of uniformly distributed points 
(•)j. We can also obtain an estimation of the accuracy given by a^ = {uj'^)g — (w)^ - the 
standard deviation. The next step is to pick out a subset of the point distributed according 
to f{x) from {xi}iy. Let us formulate the standard rejection sampling procedure^: 

• Generate a random number < ^ < 1 

• Compare Wi = uji/too, if Wi > ^ - accept the point to the final sample. 

• Repeat the procedure for all points in the sample. 

It is obvious that the final sample of accepted points is distributed according to f{x). But 
we also obtain the second sample of the rejected points. Usually the sample is much larger 
than the sample of accepted points. So, the main question of this note is whether we can 
extract something useful from analyzing the rejected points. Our answer is affirmative. 
Namely, we will construct a procedure, which "returns" the rejected point sample to (al- 
most) the initial position before the von Neumann procedure was applied. Thus, we will 
build an iterative Monte-Carlo rejection sampling. 

The possibility of the procedure is based on two observations. First, we can apply 
the "hit-and-miss" algorithm for points distributed according to any importance sampling 
function. The only conditions we require is our ability to calculate the function at each point 
in the initial sample and the positivity of the function, since g{xi) goes to denominator. 
Certainly, the acceptance efficiency will be strongly dependent on the chosen function. The 
second observation (or, strictly speaking, a limitation) is the relevance of the maximum 
value of f{x) at xq. If we take ujq for the "hit-and-miss" procedure the point will be 
accepted by construction, since the acceptance probability of the point is equal to 1.0. 
So, the rejected points will describe the region near xq worse then any other region, and 
further samples constructed at the next rejection steps will become worse and worse. Thus, 
we must formulate a simple criterion which ensures a sufficiently reasonable description of 
the region around the "maximum" in the new accepted sample and stops iteration if the 
description becomes "too poor" (see more details of the criterion in the next section). 
Therefore, our goal is twofold. First, we must construct a distribution function for the 
rejected points (we make it in this section). After that, we have to see whether the rejected 
points are still able to describe /(x) without serious drawbacks, especially in the region 
around f^ax- 

The first problem can be solved by considering the densities of points in three samples: 
a) the initial one, b) the sample of accepted points after the "hit-and-miss" procedure, and 
c) the sample of rejected points. The densities are equal to 

Pini{x) = N ■ g{x) = N ■ go{x), 

Paccix) = a ■ Nacc ■ f{x), 
Prejix) = Nrej ■ gi{x), 



^We will assume the point xo has the maximuin weight u>o — f{xo)/g{xo) in the sample. 



where we assume the importance sampUng functions are normahsed to 1.0 and a. is respon- 
sible for normahsation of f{x): a = 1/ J f{x)dx ~ 1/Itot- Nacc and Nrej are the numbers 
of accepted and rejected points correspondingly and A^ = Nacc + Nrej- Our first problem is 
to find the function gi{x). Since the samples of accepted and rejected points are obtained 
from the initial sample we can make use of the following key equation: 

Piniix) = Pacc{x) + Prej{x) (2.1) 

Substituting our expressions for the densities we obtain 

gi{xi) = {N - a ■ Nacc ■ LOi)go{Xi)/Nrej, 

where loi is the initial weight of the i-th point (for the importance sampling function 
gQ{x) = g{x)). Since the proper weights of the rejected points distributed according to 
gi{x) are f{xi)/gi{xi) we can obtain the final formula for the weight of the point x,; (if it 
has been rejected): 

_ (I-')-. , (2.2, 

1 - e • Ui/ltot 

where uji is the point weight before rejection sampling, oj^ is its weight according to the 
distribution 51 (x), and e is the acceptance efficiency for the sample. 

The first consequence of the formula is that we do not need to calculate the new 
importance sampling function gi{x) for the next iteration at all. The new weights of 
the rejected points are calculated from the old ones by means of a very simple function 
f{z) = A ■ z/{l — B ■ z). Thus, we come back to the first step and can repeat the von 
Neumann rejection procedure based on the new weights. Obviously, the second iteration 
will bring less points since larger weights became larger and vice versa. 

Accordingly the iterative Monte-Carlo "hit-and-miss" procedure has a very simple 
formulation: 

1. Prepare a sample of points {xi}N with weights uJi = f {xi) / g{x.i) and estimate Itot 

2. Perform the von Neumann procedure for the sample. 



3. Re-calculate the weights of the rejected points according the formula (|2.2| ) and find 
the point with the maximum weight. It will be the next xq. 

4. Check whether the sample satisfy a stopping condition. If not, repeat the iterations 
starting from point 2. 



It is interesting to mention that the transformation (|2.2| ) is stable during iterations. If 
we perform m iterations and accept Nad^ ...,N^c points in the iterations, then we obtain 



ijJi 



in)_ (l-6(")) 



LOi 



l-eW-Wi/Itoi' 
where e^ = EI^i ^^"^ ET=i ^al/N. 



Let's estimate the a priori number of events, which would be accepted at the (n+1)* 
iteration, if we know the information from the previous stage. In order to accept a point, 
we compare its weight with a uniformly distributed number ^. So, we can say that oJi/ojQ is 
the probability of the point to be accepted. Since the transformation ( |2.2D does not change 
the maximum point, we can write down 

/\/-{n+l) _ Ajiri) Jtot_ _ J I ^tot f.T{n) \ _ ArMn /, ,(") i /, ,W \ 

^0,rej \^0,rej / 

where A^jot , Nacc, and A'^^"- are the numbers of points in the sample before n**^ iteration, 

and the numbers of accepted and rejected points at the iteration. Wq "^j, Wq^^ ■ and cJq" ' ■ 
are the maximum weight in the whole sample before the n*'^ iteration, the maximum weight 



of rejected points only and the point weight calculated according to (|2.2| ). In the formula 
we use an approximate equality for the a priori and a posteriori acceptance efficiency at 
the n*'" iteration: Cest = hot/^o ~ ^reai = Nacc/N. 

3. The stopping rule for iterations 

The procedure constructed in the previous section lacks only one element - a reasonable 
stopping rule. In order to find the condition let us look at the formula ( |2.2| ) in more detail. 
It fails to produce sensible weights if the function /(u;) = 1 — e-uj/Itot is less than or equal to 
zero. It is clear that the worst case happens if we substitute the maximum weight ujq. Let us 
show that 1 — e-uJo/Itot = "on the average". As we pointed out above, the probability of a 
point to be accepted is equal Wj = iOi/tOQ. Since any accepted point increases the accepted 
point sample, the average number of accepted point is Nacc = Si=o Wi = N ■ Itot/^o or 
1 — e • ujQJItot = 0. So, it looks as the standard von Neumann procedure picks out all 
possible points and already the first re-weighting brings us to an inapplicable state of the 
iterative method. But, the solution of the drawback happens automatically, merely due 
to the definition of the von Neumann procedure. By construction, xq always falls within 
the accepted point sample, and for almost all rejected points the formula (|2.2| ) gives a 
finite answer. It also makes clear the source of the extra events. We effectively lower the 
normalised factor u)q -^ ^o,rej < "^O) so there is a room for more accepted events. But 
the reduction is rather smart and does not spoil the statistical sense of accepted events. 
Certainly the rejected events give a cruder estimation of the function and in realistic cases 
there exists a neighbourhood of xq where we do not have any rejected points at all. So, 
we lose some part of the total integral. As we see in the next section the total integral in 
iteration tends to decrease and, thus, further iterations give an underestimated simulation 
of our function f{x). 

Not we can formulate a natural criterion of the stopping condition for the iterative 
procedure. At the beginning we have a sample which gives us an estimation of the total 
integral with some error. It is estimated from the standard deviation calculated using the 
sample. We can continue the iterations as long as the integral estimations at the first and 



m-th steps lU and l}^ satisfy the following inequality 

I r(O) T(m)| , /o 1 A 

l^toi - -^toi I < ^0 + 0-m, (3.1) 

where cro,m are the standard deviations at the first and ?TT,-th steps. In this case, we 
still properly estimate the function f{x) at the m-th step, in general, and can extract 
points distributed according to the function. Now we are able to formulate the algorithm 
completely: 

1. Prepare a sample of points {xjIat with calculated weights w, = f{xi)/g{xi) and 
estimate Itot 

2. Perform the von Neumann procedure for the sample 

3. Re-calculate weights of the rejected points according the formula ( |2.2D and find the 
point with the maximum weight. It will be the next xq 

4. Check whether all the calculated weights are positive. If not, stop iteration 



5. Estimate the total integral value and check whether the inequality ( |3.1| ) holds true. 
If yes, repeat the iteration starting from point 2, otherwise stop. 

The final problem we come across is the question whether we can mix accepted events 
obtained in the all iterations. In fact, it is easy to prove that the accepted events have equal 
weights Itot = (/)i = {^)g{x)- The points are distributed according to g{x) = /(x)/(/), if 
we substitute the expression to the weight formula tOi = f {xi) / g{x.i) = (/). In each step in 
the algorithm we obtained events with different weights, but they are distributed within 
the sample statistical error and can be considered as statistically the same. 

4. Numerical examples 

In order to check the practicality of the method we consider three different numerical 
examples of application of the method to complicated multidimensional functions. 
The first example is a sum of two Gaussians 



h{x)=A, 



narm 



X — ai)^\ ( {x — a2 



6XP 7, — 5— + Ce • exp 



2-al J ' ^^ ^V 2-^2 



\2\ n 



(4.1) 



in a 6-dimensional unit hypercube. For calculations we chose the following values of the 
parameters: Anorm = 1-0, Ce = 729.0, ai = 0.06, as = 0.02, oi = (0.2,0.2,0.2,0.2,0.2,0.2), 
and a2 = (0.7,0.7,0.7,0.7,0.7,0.7). We chose A2 = 729.0 in order to compensate for the 
difference in the Gaussian widths. 

At first, we prepared the VEGAS grid (10 iterations of the grid customization were 
done) for the function and generated the initial sample of points; it contained approxi- 
mately 4.5 • 10^ points. After that, the iterative von Neumann procedure was carried out 
with the sample. By definition, the first iteration corresponds to the standard von Neu- 
mann selection. On the whole we made 25 iterations. No zero or negative weights were 



calculated in the iterations. Fig || displays the total integrals and cumulative numbers of 
points selected at each iteration. As we see our stopping rule ^]l] is satisfied at the 15**^ 
iteration. Fig. ^ illustrates the quality of fitting of the two obtained samples. We fit the 
histogram dNpomt/dxQ with the original function (certainly, in 1 dimension). The upper 
plot corresponds to the first iteration sample only (the standard procedure), and the dis- 
tribution for the sample of points combined for fifteen iterations is depicted in the lower 
plot. The latter sample is 6.5 times bigger then the former sample. Although the quality 
of fit for the iterative sample is slightly lower, all parameters of the original function are 
reconstructed better than for the standard sample. So, we can conclude the built iterative 
method works properly in this particular example. 
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Figure 1: Values of the total integral and the cumulative number of selected events at iterations 



of the iterative von Neumann procedure for the function (4.1) 



The second example is a combination of two Breit-Wigner peaks in the 8-dimensional 
unit hypercube. The exact formula is the following: 

1 e 



/2(x) = A 



(ai-y)2 + rf 



+ 



[0.2 



where we introduce the resonance variable Y = X^j^g 
of points in the phase space. The sample size was 2.8 



y)2 + r2 ' ^^-^^ 

Xi. Again, we prepared a sample 
• 10^ points for the function with 



parameters Anarm = 60.0, ^ = 0.167, Ti = 0.01, and r2 = 0.02. The two peaks are 
located at the points ai = 0.2 and 02 = 0.75. The resonances are taken for the sum of 
four variables Y = X]i=o -^^ ^^ order to make the function less sensitive for the VEGAS 
customization. By definition, the first iteration of the iterative von Neumann procedure 
gave the standard selection with the a posteriori efficiency e^ff = 2.57%. After that 
we repeated the procedure 16 times for the re-calculated weights (at each step). The 
total integral values and numbers of selected points are reported in Fig. 0. Owing to the 



stopping rule (3T) we should combine samples of the first 6 iterations only. This results in 
the total cumulative efficiency ee// = 4.41%. Again, we fit the histogram dNpoint/dY with 
the original formula^. We see the iterative sample results in a better fit, i.e a smaller x^ 
and more precise values of the function parameters. 

The third example represents a non-symmetric function in a 20-dimensional hypercube: 

A 

•^2(^ = 7 — - — ^- 4.3) 

(ai + xoY" 

Since xq > in the hypercube and the function approaches infinity at xq = — ai, we 
simulate the right-hand slope of the function, if ai > 0. We chose the following values of 
the function parameters: Anorm = 10~^^ and a = 10^'^. The initial sample had 2.23 • 10^ 
points. The first (standard) iteration gave the a posteriori efficiency eg// = 3.29%. We 
repeated the selection ten times and combined selected points in the final samples for 
the five iterations, according to our sopping rule. The final efficiency is equal to 7.28%. 
The total integral values and numbers of selected points are reported in Fig. |5|. Since we 
cannot improve the accuracy of Anorm by our method, we are interested in the quality of 
the reconstruction of a only. We fit the histogram dNpomt/dxo with the original formula. 
Again, we see that the iterative sample results in smaller x^ and more precise value of a 
(see Fig. H). 

5. Discussion and Conclusions 

A simple generalisation of the standard von Neumann procedure has been constructed. The 
method is based on the observation that in realistic Monte-Carlo calculations we always 
have an uncertainty in the estimation of the total integral. This means we can repeat 
the standard von Neumann rejection sampling several times until we are within limits of 
the uncertainty. As soon as we exceed the limit the final sample will start to give worse 
results. Since we exclude some points in the rejection sampling we must rearrange weights 
of points in the rejected sample. We found that the transformation is universal (it does 
not depend on the function considered) and extremely simple - the key formula ( |2.2| ). The 
second obstacle which can prevent iteration is negative weights of some points. This can 



Strictly speaking, for the fitting procedure we must multiply the expression (4.2) by Y since we 
prepared the sample parametrised with {xq, ...X7}, but the resonance variable is the combination Y — 
X^i^o ^'' If "*^6 transform the initial parametrisation into the most natural one, which contains Y, integration 
over the other re-defined variables gives us an extra factor y^~^ in the numerator. In our case, N is equal 
to 4. 



happen especially in the case of very low selection efficiency. So, we formulated a stopping 
rule for the iterative procedure, which permits one to stay within a well-defined statistical 
interpretation of selected points and to combine all samples obtained in the iterations into 
one. 

It is worth stressing that, by definition of our method, we cannot improve the accuracy 
of the total integral or, in other words, the quality of the reconstruction of the total 
normalization constant of a function under consideration. Moreover, since we select points 
with larger weights, the rejected point sample gives a biased estimation of the total integral. 
Of course we can prevent bad behaviour of the iteration process by application of the 
stopping rule ( ^.l]) . 

Three numerical examples show the procedure can really bring practical benefits in 
calculations. Owing to the simplicity of the procedure it can have a very broad area of 
application. In fact, almost every sampler, satisfying the condition that we should be able 
to prepare a set of independently generated points with weights, can be supplemented with 
a code producing more points iteratively. The method can be easily generalised to stratified 
sampling, since it can be applied independently in each stratum. 

As the first realistic application we are going to implement the method in the Monte- 
Carlo generators CompHEP [^ and Herwig-|--|- Q. Certainly, the method can be adopted 
by other popular codes (e.g. PYTHIA, MadGraph, Alpgen, Sherpa ) [|lO|. 
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Figure 2: The dNpomt/dxo distribution for the example (KjJ) for two approaches, the standard 
von Neumann procedure (upper plot) and the iterative von Neumann procedure (lower plot). Both 
curves are fitted with the original function (for 1 dimension). It is worth noticing that i^ — v^, 
since this is an one-dimensional histogram. — 11 — 
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Figure 3: Values of the total integral and the cumulative number of selected events at iterations 



of the iterative von Neumann procedure for the function (4.2) 
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Standard von Neumann procedure 
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Figure 4: The dNpoint/dY distribution for the example ( |4.2| ) fitted for two approaches, the stan- 
dard von Neumann procedure (upper plot) and the iterative von Neumann procedure (lower plot). 
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Figure 5: Values of the total integral and the cumulative number of selected events at iterations 



of the iterative von Neumann procedure for the function (4.2) 
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Figure 6: The dNpoint/dxQ distribution for the example (13) fitted for two approaches, the stan- 
dard von Neumann procedure (upper plot) and the iterative von Neumann procedure (lower plot). 
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